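Meetings can end without anyone keeping minutes.
When a meeting then makes a wrong decision that opens up a gap,
it becomes hard to hold anyone accountable, and people may even carry on as if nothing had happened.
For that reason, we use GCP's Speech-to-Text feature here.
After enabling the API, we can try it from a local machine: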
Install
pip install google-cloud-speech
Local usage
import io
import os

from google.cloud import speech

# Point the client library at the service account key file.
credential_path = "cred.json"
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path


def transcribe_file(speech_file):
    """Transcribe the given audio file."""
    client = speech.SpeechClient()

    # Read the whole local recording into memory.
    with io.open(speech_file, "rb") as audio_file:
        content = audio_file.read()

    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=8000,
        language_code="en-US",
    )

    response = client.recognize(config=config, audio=audio)

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print("Transcript: {}".format(result.alternatives[0].transcript))


transcribe_file("speach.wav")
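The last line shows that we transcribe a local WAV recording into text.
For the language_code field in config,
supported languages (for example Traditional Chinese, zh-TW) are listed at
https://cloud.google.com/speech-to-text/docs/languages
As for sample_rate_hertz, the first run will tell you the recording's actual sample rate,
so you may need to adjust the value before the program runs correctly.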
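Rather than waiting for the first run to report the rate, the sample rate can also be read from the WAV header beforehand. The following is a minimal sketch using Python's standard wave module; the helper names wav_sample_rate and write_silent_wav are hypothetical and not part of google-cloud-speech, and the silent demo file only stands in for a real recording.

```python
import os
import tempfile
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz recorded in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def write_silent_wav(path, sample_rate, seconds=1):
    """Write a mono 16-bit silent WAV file, used here only as demo input."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample, i.e. 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * sample_rate * seconds)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
    write_silent_wav(demo_path, 16000)
    print(wav_sample_rate(demo_path))  # prints 16000
```

The returned value could then be passed as sample_rate_hertz when building the RecognitionConfig, assuming the recording really is 16-bit PCM as LINEAR16 expects.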
If the audio is stored in Google Cloud Storage (google-cloud-storage) instead, the official site also provides an example:
# Imports the Google Cloud client library
from google.cloud import speech
import os
credential_path = "cred.json"
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path
# Instantiates a client
client = speech.SpeechClient()
# The name of the audio file to transcribe
gcs_uri = "gs://cloud-samples-data/speech/brooklyn_bridge.raw"
audio = speech.RecognitionAudio(uri=gcs_uri)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",  # use "zh-TW" for Traditional Chinese
)
# Detects speech in the audio file
response = client.recognize(config=config, audio=audio)
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))
As you can see, the main difference is the argument to audio = speech.RecognitionAudio(),
which changes from content to uri.
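Since the two snippets differ only in that one argument, the choice could be folded into a small helper. This is only a sketch: recognition_audio_kwargs is a hypothetical name, not something the library provides, and it assumes that any source starting with "gs://" is a Cloud Storage object while anything else is a local file path.

```python
def recognition_audio_kwargs(source):
    """Return keyword arguments for speech.RecognitionAudio.

    Assumption: a source starting with "gs://" is a Cloud Storage URI and
    is passed by reference as `uri`; anything else is treated as a local
    path whose bytes are read and passed as `content`.
    """
    if source.startswith("gs://"):
        return {"uri": source}
    with open(source, "rb") as audio_file:
        return {"content": audio_file.read()}
```

With google-cloud-speech installed, it would be used as audio = speech.RecognitionAudio(**recognition_audio_kwargs("speach.wav")), and the rest of either script stays the same.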
As for pricing, the first hour each month is free; after that, transcribing one hour of audio costs roughly NT$45.
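To get a rough feel for a monthly bill, the figures quoted here can be turned into a small estimate. This sketch assumes cost grows linearly per minute past the free hour and uses the approximate NT$45 per hour figure from this post; actual billing increments and current prices should be checked on the official pricing page. The function name estimate_monthly_cost_twd is hypothetical.

```python
FREE_MINUTES_PER_MONTH = 60  # free tier quoted in this post
TWD_PER_HOUR = 45            # approximate rate quoted in this post


def estimate_monthly_cost_twd(minutes_transcribed):
    """Estimate the monthly cost in TWD, assuming linear per-minute billing."""
    billable_minutes = max(0, minutes_transcribed - FREE_MINUTES_PER_MONTH)
    return billable_minutes * TWD_PER_HOUR / 60


if __name__ == "__main__":
    # Three hours of meetings: one hour free, two hours billed.
    print(estimate_monthly_cost_twd(180))  # prints 90.0
```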